Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More precise newsflash coordinates (using Waze data) #1552

Draft
wants to merge 10 commits into
base: dev
Choose a base branch
from

Conversation

sahare92
Copy link

Added a simple logic to find the related waze accident alert using information from the newsflash
(Currently it selects a waze accident that occurred around the area of the newsflash accident , up to WAZE_ALERT_NEWSFLASH_DELTA_IN_HOURS hours before the newsflash's time.

It currently only populates the newsflash.waze_alert property, without modifying the lat/lon attributes according to the waze accident (this should be updated after we test it manually, and make sure it indeed improves the precision)

@codecov
Copy link

codecov bot commented Oct 20, 2020

Codecov Report

Merging #1552 into dev will increase coverage by 0.97%.
The diff coverage is 28.20%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev    #1552      +/-   ##
==========================================
+ Coverage   50.38%   51.35%   +0.97%     
==========================================
  Files          62       62              
  Lines        5724     6048     +324     
==========================================
+ Hits         2884     3106     +222     
- Misses       2840     2942     +102     
Flag Coverage Δ
unittests 51.35% <28.20%> (+0.97%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
anyway/parsers/injured_around_schools.py 0.00% <0.00%> (ø)
anyway/parsers/utils.py 10.00% <12.50%> (+10.00%) ⬆️
anyway/parsers/location_extraction.py 26.88% <37.50%> (-41.91%) ⬇️
tests/test_news_flash.py 90.90% <50.00%> (-6.29%) ⬇️
anyway/parsers/__init__.py 100.00% <100.00%> (ø)
anyway/parsers/news_flash_db_adapter.py 60.86% <0.00%> (-8.70%) ⬇️
tests/test_infographic_api.py 50.71% <0.00%> (+0.35%) ⬆️
anyway/infographics_utils.py 87.21% <0.00%> (+6.67%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0fd4f86...c9ef932. Read the comment docs.

@atalyaalon atalyaalon requested review from elazarg and Mano3 October 20, 2020 19:49
Copy link

@elazarg elazarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I have added some comments, but they are mostly minor suggestions.

I am not entirely sure the best way of adding SQL rows for tests is using this "add, commit; use; delete, commit" pattern; I'm not an expert, but I think it could be done using something like "add; use; rollback" pattern. Maybe @ziv17 would like to comment on this.

anyway/parsers/location_extraction.py Show resolved Hide resolved
anyway/parsers/utils.py Show resolved Hide resolved
anyway/parsers/utils.py Outdated Show resolved Hide resolved
tests/test_news_flash.py Outdated Show resolved Hide resolved
@ziv17
Copy link
Collaborator

ziv17 commented Oct 21, 2020

You can mock the db to return what you need for the test. Using mock you get much faster tests, and it is easier to test for boundary conditions. The tests can run the tests directly from the IDE.

@sahare92
Copy link
Author

@ziv17 I think there's a benefit to run the query on the db (instead of mocking), because this way we can test that the generated query is working, and actually returns the expected WazeAlert from the db.
but I can mock the db response if you think it'll be better. WDYT?

@elazarg
Copy link

elazarg commented Oct 21, 2020

I agree with Ziv. Mocking is generally better, and it's how other tests are written. If you want to test querying the DB, that would be a different test.

_delete_waze_alert(waze_alert.id)


def _create_waze_accident_alert():
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you're using the context manager, there's no need to write these two functions separately. Just inline them.

.filter(WazeAlert.alert_type == "ACCIDENT")
.filter(
WazeAlert.created_at.between(
newsflash.date - timedelta(hours=WAZE_ALERT_NEWSFLASH_DELTA_IN_HOURS),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this constant is only used inside the function, it's should probably be local constant. This way the occasional reader doesn't have to wonder where else it is used.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it can be a timedelta, making it slightly more type-safe, and making this expression a single line.

@ziv17
Copy link
Collaborator

ziv17 commented Oct 21, 2020

@ziv17 I think there's a benefit to run the query on the db (instead of mocking), because this way we can test that the generated query is working, and actually returns the expected WazeAlert from the db.
but I can mock the db response if you think it'll be better. WDYT?

We do need both end-to-end tests and tests to cover the logic. Covering the logic with end-to-end will be slow, unstable, and cumbersome. The best practice is small amount of end-to-end and and reach coverage with unit tests (that use mocking)

@Mano3
Copy link
Collaborator

Mano3 commented Oct 24, 2020

Hi @sahare92 , great work!
I'm mostly concerned about the algorithmic aspect.
How were the distances per resolution picked?
How was the time delta picked?
Why taking the first waze alert in the bounding box instead of the closest?
Also, there is no need to query Waze alert from the news flash datetime minus delta until now. News flash will probably appear after the event is already finished so at most it should query up until news flash datetime + delta.
Was this logic checked on a large chunk of existing news flash?
I wish to know some statistics and results of manual review of how this logic fares on a large amount of data before merging.
Thanks.

@sahare92
Copy link
Author

sahare92 commented Nov 4, 2020

@Mano3 thanks,

How were the distances per resolution picked?
How was the time delta picked?

  • The distances by resolution were arbitrarily picked for now, until we'll learn what the best values are.
  • The same for time delta, and bounding box

Why taking the first waze alert in the bounding box instead of the closest?

  • We didn't take the closest, because it's not necessarily the right one (for example if the resolution is "עיר" then the geo_location we got from google is just the center of the city, so choosing the closest waze accident doesn't mean it's the right one..but mostly we had hoped that in real data, most times we won't get more than one matching waze accidents (in the time range we will decide to query)

Also, there is no need to query Waze alert from the news flash datetime minus delta until now. News flash will probably appear after the event is already finished so at most it should query up until news flash datetime + delta.

  • Good point, I'll fix that 👍

Was this logic checked on a large chunk of existing news flash? .......

  • No, we didn't test it yet, after talking with @atalyaalon, we thought it'll be a good idea to test this with previous data (hopefully soon, when waze data backfill will be fixed)
    Do you have an idea to test this, other then going manually over the data and see if the chosen waze location was accurate according to the newsflash description? if not, then I guess that's what I'll do once waze data backfill is in

@atalyaalon atalyaalon marked this pull request as draft March 31, 2022 19:51
@atalyaalon atalyaalon marked this pull request as draft March 31, 2022 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants